Abstract. In recent years, Learning by Imitation (LbI) has been increasingly
explored in order to easily instruct robots to execute complex motion tasks.
However, most approaches do not consider the case in which multiple, and
sometimes conflicting, demonstrations are given by different teachers.
Moreover, although it seems advisable that the robot does not start as a
tabula rasa, re-using previous knowledge in imitation learning is still a
difficult research problem. In order to be used in real applications, LbI
techniques should be robust and incremental. For this reason, the challenge
of our research is to find alternative methods for incremental,
multi-teacher LbI.


1 Introduction and Relevance of the Problem 

Over the last decade, robot Learning by Imitation (LbI) has been increasingly
explored in order to easily and intuitively instruct robots to execute complex
tasks. By providing a human-friendly interface for programming by
demonstration, such methods can support the deployment of robotics in domestic
and industrial environments. The technical intervention of expert users, in
fact, is not strictly required and, therefore, the costs of (re)programming a
robot are drastically reduced. Despite these advantages in terms of
flexibility and cost reduction, LbI also brings its own set of problems. For
example, understanding the focus of the demonstration (“what to imitate”),
adapting the demonstration to the different embodiment of the robot and
obtaining good performance in task execution (“how to imitate”) are typical
challenges of LbI. These problems have been described and addressed in several
ways [1] [2] [3] and a large body of literature exists on the topic. For
example, different representations have been proposed for encoding learned
trajectories or goals, and interactive learning techniques have been
developed, as in [4], for improving the acquired skills. The common assumption
behind a large part of the work in the literature, however, is that
demonstrations are provided by a single teacher, in particular a human. This
is not always the case: not only could a robot learn from other robots [5] [6]
or animals (e.g., bio-inspired robots), but multiple teachers could also
provide the robot with conflicting demonstrations or feedback/advice.
Moreover, although only a few works have focused on the incremental learning
problem, it is crucial for achieving robot autonomy. It seems advisable, in
fact, that a robot does not learn every single task from scratch, since its
knowledge can be augmented for executing more complex tasks or for obtaining
increasingly better imitations. The challenge of our research is to propose a
set of solutions for improving LbI techniques, by considering both multiple
teachers and incremental learning. In contrast to previous work, we will focus
our research on learning from multiple categories of teachers (e.g., humans,
robots, animals). Moreover, we will consider classical solutions like
reliability measurements and teacher selection, as well as techniques for
strategy co-activation, strategy changing and online refinement via
contrasting feedback/advice. Sub-skill co-activation will also be adopted for
improving incremental learning, with the underlying idea of extending current
non-symbolic approaches to reach higher levels of learning autonomy for
hierarchical and complex tasks.

Fig. 1. Schema of the main challenges in LbI.

2 Related Work 

As already stated in the previous section, Learning by Imitation provides a
high-level method for programming a robot which can be easily used by
non-expert users. However, while the effort for providing prior knowledge to
the robot is drastically reduced, new and different issues emerge. A
frequently used description of LbI challenges consists of a set of independent
problems presented in the form of questions (see Fig. 1): Who to imitate? When
to imitate? What to imitate? How to imitate? A great deal of effort has been
devoted, in previous work, to understanding what is relevant for the robot and
how it should learn a skill, while who and when to imitate are still open
challenges. Indeed, only a small amount of work has been done in this
direction. A detailed overview of the approaches adopted for solving those
problems can be found in [7] and [3].

One of the first problems when dealing with imitation is to understand how 
to encode a learned skill. While spline representations cannot be easily used for 
encoding a skill, because of their explicit time dependency, many alternatives 
exist. In particular, Hidden Markov Models (HMMs) have often been successfully
applied in this context [8]. Billard et al. [9], for example, use two HMMs, one 
to eliminate signals with high variability and the other one, fully connected, to 
obtain a probabilistic encoding of the task. In the work by Asfour et al. [10], a 
humanoid robot is instructed by using continuous HMMs, trained with a set of 
key points common to almost all the demonstrations. By also detecting temporal
dependencies between the two arms, dual-arm tasks are successfully executed. 
Calinon et al. [11], instead, use HMMs for representing a joint distribution of 
position and velocity, while generalizing the motion during the reproduction 
through the use of Gaussian Mixture Regression. The approach is validated 
on several robotics platforms. Additional improvements in the generalization of 
movements have been achieved thanks to the use of Gaussian mixture models 
(GMMs). For example, in [12], the authors propose an LbI framework, based on a
mixture of Gaussian/Bernoulli distributions, for extracting relevant features of a 
task and generalizing the acquired knowledge to multiple contexts. Chernova and 
Veloso [13], instead, use a representation of the policy based on GMMs in order 
to address the uncertainty of human demonstration. In particular, they propose 
an approach which enables the agent to request demonstrations for specific parts 
of the state space, achieving increasing autonomy in the execution based on the 
analysis of the learned Gaussian mixture set. 
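
To make the regression step mentioned above more concrete, the following is a
minimal sketch of Gaussian Mixture Regression for a mixture defined over a
time input t and a one-dimensional position x. It is not taken from any of the
cited works: the function names and the hand-picked mixture parameters are
assumptions made for illustration only, and in practice the mixture would be
fitted to demonstrations (e.g., with EM).

```python
import numpy as np


def gaussian_pdf(t, mean, var):
    """Density of a one-dimensional Gaussian evaluated at t."""
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def gmr(t, priors, means, covs):
    """Gaussian Mixture Regression: E[x | t] for a GMM over (t, x).

    priors: (K,) mixture weights
    means:  (K, 2) component means, ordered as [t, x]
    covs:   (K, 2, 2) component covariances in the same ordering
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    priors = np.asarray(priors, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    # Responsibility of each component for each query input, shape (K, T).
    resp = np.array([
        p * gaussian_pdf(t, m[0], c[0, 0])
        for p, m, c in zip(priors, means, covs)
    ])
    resp /= resp.sum(axis=0, keepdims=True)
    # Conditional mean of x given t under each component, shape (K, T).
    cond = np.array([
        m[1] + c[1, 0] / c[0, 0] * (t - m[0])
        for m, c in zip(means, covs)
    ])
    return (resp * cond).sum(axis=0)


# Hypothetical two-component model of a reaching motion (illustrative values).
priors = [0.5, 0.5]
means = [[0.25, 0.2], [0.75, 0.8]]
covs = [[[0.02, 0.01], [0.01, 0.02]],
        [[0.02, 0.01], [0.01, 0.02]]]
generalized = gmr(np.linspace(0.0, 1.0, 11), priors, means, covs)
```

The retrieved motion is a smooth blend of the per-component linear
predictions, weighted by how responsible each component is for the queried
time step.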

In order to reduce the typically high-dimensional state-action space of those
problems, a different category of work focuses on the representation of tasks
as a composition of motion primitives. Dynamic Movement Primitives (DMPs), in
particular, have been proposed by Ijspeert et al. [14] [15] [16] in order to
encode the properties of the motion by means of differential equations. These
primitives, which can take into account perturbations and feedback terms, have
been successfully applied by Schaal et al. [17], in the context of learning by
demonstration, on several examples. Ude et al. [18] present a method for
generalizing periodic DMPs and synthesizing new actions in situations that a
robot has never encountered before. As an additional example, Stulp and Schaal
[19] use DMPs for hierarchical learning via Reinforcement Learning (RL) and
apply their approach to an 11-DOF arm plus hand for a pick-and-place task.
More recently, an alternative representation, Probabilistic Movement
Primitives, has been proposed by Paraschos et al. [20]; it can be used in
several applications and allows for blending between motions, adapting to
altered task variables, and co-activating multiple motion primitives in
parallel.
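
As an illustration of how a motion can be encoded by differential equations,
the sketch below implements a one-dimensional discrete DMP in a commonly used
form: a critically damped spring-damper system driven by a learned forcing
term that vanishes with a decaying phase variable. The class name, the gains,
the number of basis functions and the minimum-jerk demonstration are all
assumptions chosen for the example; they are not the settings used in the
cited works.

```python
import numpy as np


class DMP:
    """Minimal one-dimensional discrete DMP (illustrative sketch)."""

    def __init__(self, n_basis=20, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0  # critically damped spring-damper
        self.alpha_x = alpha_x
        # Basis centres evenly spaced in time, expressed in the phase x.
        self.centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        spacing = np.diff(self.centers)
        self.widths = 1.0 / np.append(spacing, spacing[-1]) ** 2
        self.weights = np.zeros(n_basis)
        self.y0, self.goal, self.tau = 0.0, 1.0, 1.0

    def _forcing(self, x, goal):
        # Normalized weighted sum of Gaussian bases, gated by the phase.
        psi = np.exp(-self.widths * (x - self.centers) ** 2)
        return psi @ self.weights / (psi.sum() + 1e-10) * x * (goal - self.y0)

    def fit(self, y_demo, dt):
        """Fit the forcing-term weights to one demonstration."""
        y = np.asarray(y_demo, dtype=float)
        self.y0, self.goal = y[0], y[-1]
        self.tau = dt * (len(y) - 1)
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.arange(len(y)) * dt / self.tau)
        # Forcing term that would reproduce the demonstration exactly.
        f_target = self.tau ** 2 * ydd - self.alpha_z * (
            self.beta_z * (self.goal - y) - self.tau * yd)
        s = x * (self.goal - self.y0)
        psi = np.exp(-self.widths * (x[:, None] - self.centers) ** 2)
        # Locally weighted regression, one weight per basis function.
        self.weights = (psi * (s * f_target)[:, None]).sum(axis=0) / (
            (psi * (s ** 2)[:, None]).sum(axis=0) + 1e-10)

    def rollout(self, dt, n_steps, goal=None):
        """Integrate the DMP with Euler steps, optionally towards a new goal."""
        goal = self.goal if goal is None else goal
        y, z, x = self.y0, 0.0, 1.0
        path = [y]
        for _ in range(n_steps):
            f = self._forcing(x, goal)
            z += dt * (self.alpha_z * (self.beta_z * (goal - y) - z) + f) / self.tau
            y += dt * z / self.tau
            x += dt * (-self.alpha_x * x) / self.tau
            path.append(y)
        return np.array(path)


# Demonstration: a minimum-jerk reach from 0 to 1 lasting one second.
dt = 0.01
t = np.linspace(0.0, 1.0, 101)
demo = 10 * t ** 3 - 15 * t ** 4 + 6 * t ** 5
dmp = DMP()
dmp.fit(demo, dt)
reproduced = dmp.rollout(dt, n_steps=150)
retargeted = dmp.rollout(dt, n_steps=150, goal=2.0)
```

Because the forcing term fades out with the phase variable, the rollout
converges to the goal even when the goal is changed after learning, which is
the property that makes this representation attractive for generalization.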

In several works, traditional imitation learning techniques have been
associated with methods for refining the learned policy, as in the case of
Nicolescu and Mataric [21]. More specific approaches based on Reinforcement
Learning reduce the time needed for finding good control policies, while
improving the performance of the robot (when possible) beyond that of the
teacher. Guenter and Billard [22], for example, use RL in order to relearn
goal-oriented tasks even with unexpected perturbations. In more detail, a GMM
is used as a first attempt to reproduce the task and, then, RL is used to
adapt the encoded speed to perturbations. A limitation of the approach is that
the system has to completely relearn the trajectory every time a new
perturbation is added. Kober and Peters [23] use episodic RL in order to
improve motor primitives learned by imitation for a Ball-in-a-Cup task.
Kormushev et al. [24], instead, encode movements with an extension of DMPs
initialized from imitation. RL is then used for learning the optimal
parameters of the policy, thus improving the learned capability. A different
approach in the same direction has been proposed by Argall et al. [4]. Rather
than using traditional Reinforcement Learning, the authors consider the advice
of the teacher in order to improve the learned policy, by directly applying a
correction to the executed state-action mapping.

Such solutions are well suited whenever a robot needs to learn a task from a
single teacher. However, issues emerge if conflicting demonstrations, or
rewards in the case of RL, are provided by different teachers (“who to
imitate”) by means of different sensors and modalities. For non-linear
systems, in fact, simply averaging the learned trajectories usually results in
a new trajectory that is not feasible, since it does not obey the constraints
of the dynamic model. Preliminary work in the direction of addressing this
problem has been done by Nicolescu and Mataric [21], who propose a
topology-based method for generalization among multiple demonstrations
represented as behavior networks. Argall et al. [25] consider the
incorporation of demonstrations from multiple teachers by selecting among them
on the basis of their observed reliability. More specifically, reliability is
measured and represented through a weighting scheme. Babes et al. [26] apply
Inverse Reinforcement Learning (IRL) [27] to learning from demonstration, by
adopting a clustering procedure on the observed trajectories for inferring the
expert’s intention. This is particularly useful to discriminate among
different demonstrations whose underlying goal (and reward function) is not
previously or clearly specified. Tanwani and Billard [28], instead, propose a
method based on IRL for learning to mimic a variety of experts with different
strategies. While providing high adaptability, such an approach makes it
possible to bootstrap optimal policy learning by transferring knowledge from
the set of learned policies. Most of these approaches, however, neither allow
smooth switching among different policies, when needed, nor consider the
opportunity to prioritize among different strategies which are not
incompatible. Moreover, teachers are usually considered to be human beings,
while in real applications demonstrations could be provided by arbitrary
expert agents, such as other robots [5] [6] or even animals. Additional work
should be focused on the online version of this problem, in which contrasting
feedback is given to the robot by multiple teachers and refinements over
different learned policies are required.
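
The problem with averaging, and the alternative of selecting a teacher by
reliability, can be illustrated with a toy example. In the sketch below two
hypothetical teachers pass an obstacle on opposite sides; a geometric obstacle
stands in for the constraints of the dynamic model, and the reliability update
is a simple running average chosen for illustration, not the weighting scheme
of [25]. All names and values are assumptions.

```python
import numpy as np

OBSTACLE_RADIUS = 0.5  # disc centred at the origin (illustrative constraint)


def arc(sign, n=41):
    """Planar path from (-2, 0) to (2, 0) passing above or below the origin."""
    x = np.linspace(-2.0, 2.0, n)
    y = sign * (1.0 - (x / 2.0) ** 2)
    return np.stack([x, y], axis=1)


def is_feasible(path):
    """A path is feasible if every point lies outside the obstacle."""
    return bool(np.all(np.linalg.norm(path, axis=1) > OBSTACLE_RADIUS))


# Two conflicting demonstrations of the same task.
demos = {"teacher_a": arc(+1.0), "teacher_b": arc(-1.0)}
averaged = np.mean(list(demos.values()), axis=0)


def update_reliability(weights, teacher, success, rate=0.3):
    """Move a teacher's weight towards 1 after a success, towards 0 otherwise."""
    new = dict(weights)
    new[teacher] += rate * (float(success) - new[teacher])
    return new


def select_teacher(weights):
    """Pick the demonstration source with the highest reliability weight."""
    return max(weights, key=weights.get)


weights = {"teacher_a": 0.5, "teacher_b": 0.5}
weights = update_reliability(weights, "teacher_a", success=True)
weights = update_reliability(weights, "teacher_b", success=False)
chosen = select_teacher(weights)
```

Each demonstration is feasible on its own, while their average runs straight
through the obstacle; selecting one teacher keeps the executed path inside the
feasible set, at the cost of discarding the other strategy entirely.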

Another limitation of the work in the literature lies in the assumption that
robots need to learn a single task from scratch, without previous knowledge.
Real-world applications, instead, demand robots which can incrementally
acquire new task execution capabilities based on already learned skills. A
great deal of effort has been devoted to this problem in the direction of
using symbolic representations of tasks, as in [21]. Pardowitz et al. [29]
[30], who follow the general approach described in [31], use a hierarchical
representation of complex tasks, generated as a sequence of elementary
operators (i.e., basic actions, primitives). The method is applied to a robot
servant which has to learn an everyday household task by combining reasoning
and learning. A similar approach is used in the work by Ekvall and Kragic
[32], who decompose each task into sub-tasks which are then used, together
with a set of constraints and the identified goal, for obtaining
generalization. Symbolic representations offer, of course, many advantages
when dealing with complex tasks, but they require a large effort to provide
prior knowledge to the robot, resulting in a loss of flexibility. Conversely,
other work is oriented towards achieving incremental learning from scratch,
without the intervention of experts in providing knowledge. Friesen and Rao
[33] propose a solution for achieving hierarchical task control, by means of
an extended Bellman equation. Starting from the equation used in [34] for
“implicit imitation”, the authors consider both temporally extended actions
(called options) and primitives. Such options can execute other options. An
interesting evolution towards incremental learning can be noticed in the work
by the research group of Jan Peters [35] [36] [37] [38] [39] [40] [41]. In
particular, in [39] a general overview of the adopted modular approach is
given. The authors describe a method for generalizing and learning several
motor primitives (building blocks), as well as learning to select and to
sequence the building blocks for executing complex tasks. Even though this
technique represents a major advancement towards incremental learning, the gap
between the purely symbolic approach and the “numerical” one is still
significant.

3 Methodology and Proposed Solution 

The challenge of this research idea consists in addressing both the problems
of multi-teaching (robustness) and incremental learning, starting from the
work previously presented. For this purpose, state-of-the-art sensing
techniques and off-the-shelf perception modules will be used to acquire task
demonstrations, since perception is not directly related to the considered
challenges.

The general idea of the proposed approach is based on a mixture of techniques
from Artificial Intelligence and Control Theory. On the one hand,
Reinforcement Learning has often been explored in combination with traditional
LbI for efficient and accurate task reproduction; on the other hand, it has
been shown that RL is also effective for obtaining bio-inspired and adaptive
controllers able to find optimal policies, in terms of control cost, on-line
[42]. Assume that, for each task, n different or contrasting demonstrations
are provided to the robot by k different teachers. Each teacher may have their
own strategy or may change their behavior on the basis of the context.
Starting from these demonstrations, a basic step would be the generation of a
smaller number n' of clusters, in order to reduce the dimensionality of the
problem. After dividing the obtained clusters into m sub-parts, through a
segmentation process, each demonstrated sub-policy (n' · m in total) should be
learned by applying Inverse Reinforcement Learning techniques. At the same
time, in order to produce a more goal-oriented solution, m general DMPs (one
for each sub-part) will be continuously refined on the basis of the set of all
the n demonstrations. A graphical description of the approach is available in
Fig. 2. At task execution time (on-line), for each sub-part, the robot should
be able to choose among the different policies and the refined DMPs, on the
basis of the context or constraints. The choice will strictly depend on the
state of the robot and on the priority (if available) of the tasks to be
executed. Interaction with users characterized by different policies will
enable a further refinement of the adopted policies, as well as a weighting
process among the produced solutions, based on their given reward. Finally, in
the case of non-contrasting demonstrations, a priority-based execution of
co-activated policies will be implemented.

Fig. 2. Graphical presentation of the learning approach.
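
The first two steps of this pipeline, grouping the demonstrations into a
reduced number of clusters and splitting each cluster into m sub-parts, can be
sketched as follows. This is only an illustration under strong assumptions:
one-dimensional trajectories, k-means with a deterministic farthest-point
initialization as the clustering method, and uniform splitting as a
placeholder for a real segmentation process. The IRL and DMP refinement steps
are not shown, and all function names are hypothetical.

```python
import numpy as np


def resample(traj, n_points=50):
    """Linearly resample a one-dimensional trajectory to a fixed length."""
    traj = np.asarray(traj, dtype=float)
    old = np.linspace(0.0, 1.0, len(traj))
    new = np.linspace(0.0, 1.0, n_points)
    return np.interp(new, old, traj)


def assign(X, centers):
    """Index of the nearest cluster centre for every demonstration."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def cluster_demos(demos, n_clusters, n_iter=20):
    """Group demonstrations with k-means (farthest-point initialization)."""
    X = np.stack([resample(d) for d in demos])
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return assign(X, centers), centers


def segment(traj, m):
    """Placeholder segmentation: split a trajectory into m contiguous parts."""
    return np.array_split(traj, m)


# Six noisy demonstrations of different lengths following two strategies.
rng = np.random.default_rng(0)
demos = []
for length in (40, 50, 60):
    s = np.linspace(0.0, 1.0, length)
    demos.append(s + 0.01 * rng.standard_normal(length))
for length in (40, 50, 60):
    s = np.linspace(0.0, 1.0, length)
    demos.append(s ** 3 + 0.01 * rng.standard_normal(length))

labels, centers = cluster_demos(demos, n_clusters=2)
m = 3
sub_policies = [part for c in centers for part in segment(c, m)]
```

With two clusters and three sub-parts the sketch yields six candidate
sub-trajectories, one per cluster and sub-part, each of which would then be
handed to the policy-learning step.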

Intuitively, such a “motion library” will be useful to address two typical
issues of the incremental learning problem: recognizing in the demonstrations
the set of already available sub-skills, and reducing the redundancy of task
information. Based on this, the approach adopted in [39] for combining the
building blocks in the execution of complex tasks will be extended to consider
co-activated, non-interfering sub-skills, on a priority basis. Moreover, a
simple approach, based on the extraction of the most relevant features of each
sub-task, will be used to partially reduce the gap between the numerical and
symbolic representations used in LbI. At the same time, higher-level planning
will eventually be performed by means of Hierarchical Task Networks.
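
A minimal sketch of how such a library could recognize already available
sub-skills and avoid storing redundant ones is given below. It assumes
one-dimensional segments compared by root-mean-square distance after
resampling, with a fixed similarity threshold; the class, its methods and the
threshold value are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


class MotionLibrary:
    """Stores sub-skills and reuses an existing one when a new segment matches."""

    def __init__(self, threshold=0.1, n_points=50):
        self.threshold = threshold
        self.n_points = n_points
        self.skills = []  # list of (name, resampled trajectory)

    def _normalize(self, segment):
        # Resample to a fixed length so that segments are comparable.
        seg = np.asarray(segment, dtype=float)
        old = np.linspace(0.0, 1.0, len(seg))
        new = np.linspace(0.0, 1.0, self.n_points)
        return np.interp(new, old, seg)

    def match(self, segment):
        """Return (name, distance) of the closest stored sub-skill."""
        query = self._normalize(segment)
        best_name, best_dist = None, float("inf")
        for name, stored in self.skills:
            dist = float(np.sqrt(np.mean((query - stored) ** 2)))
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name, best_dist

    def add_or_reuse(self, name, segment):
        """Reuse a matching sub-skill if one exists, otherwise store a new one."""
        best_name, best_dist = self.match(segment)
        if best_dist <= self.threshold:
            return best_name
        self.skills.append((name, self._normalize(segment)))
        return name


library = MotionLibrary()
s = np.linspace(0.0, 1.0, 80)
first = library.add_or_reuse("reach", s)
second = library.add_or_reuse("reach_again", s[::2] + 0.01)
third = library.add_or_reuse("retract", 1.0 - s)
```

In this toy run the second segment is recognized as an instance of the stored
"reach" sub-skill, so only two entries end up in the library.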

Fig. 3. KUKA Youbot robotic platform at the RoCKIn Camp 2014. 


The proposed solutions will be extensively validated on simulated and real
robots, in both domestic and industrial domains. In particular, the whole
system will be developed on the Robot Operating System (ROS)¹ framework, since
it is very popular. This will enable not only an easy integration with
realistic simulators like V-REP² and Webots³, but also an easily transferable
implementation for a real robot, like the KUKA Youbot (Fig. 3). This robotic
platform consists of an omnidirectional mobile base and a 5-DOF arm, plus the
gripper, and it can be considered a good solution for preliminary experiments
in this research. Using the Youbot, in fact, makes it possible to experiment
with LbI in industrial-like scenarios, as in the case of the RoCKIn@Work⁴
competitions. Due to the robot structure, LbI implementations on this platform
will have to take into account the correspondence problem [1]. Note, however,
that this is a classical issue in the LbI implementation pipeline, since the
embodiment of the demonstrator and that of the robot are usually different,
with the exception of humanoid robots. Additional tests will be executed on
specific simple tasks (e.g., door opening and ball throwing), as well as in
the context of benchmarking activities (e.g., RoCKIn).


4 Conclusions and Potential Impact 

Producing a robot which can be easily instructed to perform difficult tasks
will open many business opportunities. In the coming years, in fact,
industrial and general-purpose domestic robots will be available to wider
communities of non-expert users. The use of incremental, human-inspired
learning approaches could enable next-generation robots to learn from others
as well as from their own experience. For this reason, we strongly believe
that an intuitive multi-teaching “interface” for robots could improve not only
the overall quality of the user experience and the robot usability, but also
the acceptance of robots in our society. We also think that exploring robust
and incremental LbI methods could have a long-term positive impact from an
economic point of view. Consider, for example, the money spent by big
companies on programming robots: industries could save a considerable amount
by having the possibility to easily reprogram, or improve with the advice of
different teachers, a single part of the task that a robot has to execute. For
this reason, the developed algorithms could be included in ROS Industrial⁵,
whose goal is to transfer the advances in robotics research to concrete
applications with economic potential. From an academic point of view, the
interest in human movement understanding is increasing⁶, and improvements in
LbI could have a strong impact in this area, since it is strictly related to
natural movement and specific motion dynamics. In conclusion, we believe that
research in this area can be further extended towards practical applications
and real-world scenarios, but we are aware that this document represents only
the starting point for a detailed analysis and investigation of the possible
techniques for approaching robust and incremental LbI.

¹ http://www.ros.org/

² http://www.coppeliarobotics.com/

³ http://www.cyberbotics.com/

⁴ http://rockinrobotchallenge.eu/